HELIX.RData Download:

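# Packages used below (assumed to be attached in a setup chunk that is not shown)
invisible(lapply(c("dplyr", "ggplot2", "DT", "summarytools", "table1", "corrplot", "grpreg"), library, character.only = TRUE))
helix_file_path <- "~/Library/CloudStorage/GoogleDrive-bandov@usc.edu/.shortcut-targets-by-id/1oBvDKkpKxGnEoNogWDoXV--27W2spqKh/HELIX_data/HELIX.RData"
load(helix_file_path)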

Metabol_serum.RData Download:

metabol_serum_file_path <- "~/Library/CloudStorage/GoogleDrive-bandov@usc.edu/.shortcut-targets-by-id/1oBvDKkpKxGnEoNogWDoXV--27W2spqKh/HELIX_data/metabol_serum.RData"
load(metabol_serum_file_path)

Adjusting BMI Category

# Collapse BMI category into two levels to improve predictive power: 0 = categories 1-2, 1 = categories 3-4
phenotype <- phenotype %>%
  mutate(hs_bmi_c_cat = ifelse(hs_bmi_c_cat %in% c(1, 2), 0, 
                               ifelse(hs_bmi_c_cat %in% c(3, 4), 1, hs_bmi_c_cat)))
phenotype

Abstract

This study examines the impact of environmental exposures during pregnancy on childhood body mass index (BMI). Using data from the HELIX study, we apply …….. [INSERT MORE STUFF HERE]

Hypothesis:

How do prenatal dietary intake and concentrations of prenatal phthalate exposures influence body mass index (BMI) for children 6-11 years old while controlling for child age at examinations and gestational age at birth, and can urine metabolomics data improve predictive models of BMI using statistical and machine learning tools? [ need to edit]

Introduction

The relationship between prenatal environmental exposures and childhood health outcomes is important, especially in understanding how prenatal factors influence long-term health in children. This data project aims to explore the influence of prenatal dietary intake and phthalate exposure concentrations on body mass index (BMI) in children aged 6 to 11 years. Phthalates, commonly found in plastics and personal care products, are known endocrine disruptors that may impact fetal development and childhood growth patterns (Valvi et al., 2020). Previous studies have shown that prenatal exposure to phthalates is associated with an increased risk of obesity and metabolic disorders in children (Luo et al., 2020). By controlling for key variables such as child age at examinations and gestational age at birth, this project seeks to examine the direct and interactive effects of these prenatal exposures on BMI.

Recent studies emphasize the role of maternal diet during pregnancy. They suggest that balanced maternal nutrition can mitigate the adverse effects of environmental exposures such as phthalates on child BMI, so improved dietary practices during pregnancy may lead to better health outcomes for children (NCBI, 2024).

Phthalate exposure has been linked to various health issues, including obesity, type II diabetes, thyroid dysfunction, higher blood pressure, precocious puberty, and reproductive effects. It also impacts the respiratory system (allergy, asthma) and nervous system (delayed neurodevelopment, social impairment) (Serrano et al., 2021).

Furthermore, this project looks at whether urine metabolomics data can improve predictive models of BMI using statistical and machine learning approaches. Integrating metabolomics data helps to understand the biochemical pathways linking prenatal exposures (diet and phthalates) to childhood health outcomes. This project aims to provide insights into the prevention and management of childhood obesity and related health conditions.

Methods

[INSERT MORE STUFF HERE LATER]

Data

The complexity of exposure to environmental contaminants has increased because of evolving environmental and lifestyle factors. The exposome includes all environmental (non-genetic) exposures an individual encounters from conception through old age. The HELIX (Human Early-Life Exposome) project focuses on the early-life exposome: it integrates the environmental hazards that mothers and children are exposed to and links these exposures to health, growth, and development risks.

Pregnancy and early childhood are critical periods in which children are more vulnerable to environmental damage, which can have lifelong effects. Understanding the exposome during these periods can help prevent disease, since early interventions can alter biological foundations and promote healthy development. HELIX can help show how different environmental exposures jointly affect health outcomes and risks.

HELIX builds on six existing European birth cohort studies: Born in Bradford (BiB), Etude des Déterminants pré et postnatals du développement et de la santé de l’Enfant (EDEN), INfancia y Medio Ambiente (INMA), Kaunas Cohort (KANC), Norwegian Mother, Father and Child Cohort Study (MoBa), and Rhea Mother-Child Cohort Study. These cohorts have collected extensive data through national and EU-funded projects. HELIX supplements this data with advanced tools and methods to measure and integrate the chemical, physical, and molecular environment, linking these measurements to child health outcomes.

Smartphones are utilized to measure air pollution, UV radiation, physical activity, and noise exposure. Advanced laboratory techniques identify biological markers of various chemical exposures, such as contaminants in food, consumer products, and water. HELIX has gathered extensive exposome data from mothers and children, making it the largest study on this topic. The study design is multilevel: the first level includes 31,472 mother-child pairs recruited during pregnancy across the six cohorts; the second level consists of a subcohort of 1301 mother-child pairs with detailed measurements of biomarkers, omics signatures, and health outcomes at ages 6-11 years; and the third level involves repeat-sampling panel studies with about 150 children and 150 pregnant women to collect personal exposure data.

This research focuses on a subcohort of 1301 mother-child pairs to explore questions related to environmental exposures, omic data, and their impact on health outcomes. Specifically, the project will examine urine metabolomics data with Body Mass Index (BMI) as the primary outcome of interest. The goal of this project is to provide a deeper understanding of how early-life environmental exposures influence BMI, potentially providing more information on how to apply early intervention and disease prevention.

For more details on the study design, see Vrijheid, Slama, et al. (EHP, 2014). Further information about the study is available at https://www.projecthelix.eu/index.php/es/data-inventory.


Outcome

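The primary outcome of interest is childhood BMI category (hs_bmi_c_cat), collapsed into two levels: 0 (underweight or normal) and 1 (overweight or obese).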

Covariates

Covariates include child age, sex, and cohort.

Exposures/Risk Factors

The main exposures of interest are prenatal diet and prenatal phthalate contaminants. Child age, sex, and cohort enter as covariates.

Confounders/Interaction

[INSERT LATER]

Model Building

[INSERT LATER]

Model Validation

[INSERT LATER]

Results

[INSERT LATER]

Discussion

[INSERT LATER]

Conclusions

[INSERT LATER]

References

  1. Reference 1
  2. Reference 2
  3. Reference 3 [INSERT LATER]

Appendices

[INSERT LATER]

Data Description and Codebook

codebook_file_path <- "~/Library/CloudStorage/GoogleDrive-bandov@usc.edu/.shortcut-targets-by-id/1oBvDKkpKxGnEoNogWDoXV--27W2spqKh/HELIX_data/HELIX.RData"
load(codebook_file_path)
#  Chemicals
filtered_codebook_chemicals <- codebook %>%
  filter(domain == "Chemicals" & 
         family == "Phthalates" & 
         period == "Pregnancy" & 
         variable_name != "hs_sumDEHP_madj_Log2")

# Covariates 
filtered_codebook_covariates <- codebook %>%
  filter(domain == "Covariates" & 
         variable_name %in% c("e3_sex_None", "h_cohort", "hs_child_age_None"))

# Phenotype
filtered_codebook_phenotype <- codebook %>%
  filter(domain == "Phenotype" & 
         variable_name %in% c("hs_bmi_c_cat"))

# Lifestyle
filtered_codebook_lifestyles <- codebook %>%
  filter(domain == "Lifestyles" & period == "Pregnancy" & subfamily == "Diet")

# Combining all the information 
combined_codebook <- bind_rows(
  filtered_codebook_chemicals,
  filtered_codebook_covariates,
  filtered_codebook_phenotype,
  filtered_codebook_lifestyles
)

# Final Display
datatable(combined_codebook, 
          options = list(pageLength = 10, 
                         autoWidth = TRUE, 
                         dom = 'Bfrtip', 
                         buttons = c('copy', 'csv', 'excel', 'pdf', 'print'), 
                         searchHighlight = TRUE),
          caption = "Filtered Codebook for HELIX Data")

Data Summary Exposures: Lifestyles

#Lifestyle 
filtered_codebook_lifestyles <- codebook %>%
  filter(domain == "Lifestyles" & period == "Pregnancy")
selectExposures <- filtered_codebook_lifestyles$variable_name
summarytools::view(dfSummary(exposome[,names(exposome) %in% selectExposures], 
                             style = 'grid', 
                             plain.ascii = FALSE, 
                             valid.col = FALSE, 
                             headings = FALSE), 
                   method = "render")
| No | Variable | Stats / Values | Freqs (% of Valid) | Missing |
|---|---|---|---|---|
| 1 | e3_alcpreg_yn_None [factor] | 1. 0; 2. 1 | 896 (68.9%); 405 (31.1%) | 0 (0.0%) |
| 2 | h_cereal_preg_Ter [factor] | 1. (0,9]; 2. (9,27.3]; 3. (27.3,Inf] | 531 (40.8%); 459 (35.3%); 311 (23.9%) | 0 (0.0%) |
| 3 | h_dairy_preg_Ter [factor] | 1. (0,17.1]; 2. (17.1,27.1]; 3. (27.1,Inf] | 270 (20.8%); 380 (29.2%); 651 (50.0%) | 0 (0.0%) |
| 4 | h_fastfood_preg_Ter [factor] | 1. (0,0.25]; 2. (0.25,0.83]; 3. (0.83,Inf] | 94 (7.2%); 535 (41.1%); 672 (51.7%) | 0 (0.0%) |
| 5 | h_fish_preg_Ter [factor] | 1. (0,1.9]; 2. (1.9,4.1]; 3. (4.1,Inf] | 343 (26.4%); 490 (37.7%); 468 (36.0%) | 0 (0.0%) |
| 6 | h_folic_t1_None [factor] | 1. 0; 2. 1 | 606 (46.6%); 695 (53.4%) | 0 (0.0%) |
| 7 | h_fruit_preg_Ter [factor] | 1. (0,0.6]; 2. (0.6,18.2]; 3. (18.2,Inf] | 6 (0.5%); 922 (70.9%); 373 (28.7%) | 0 (0.0%) |
| 8 | h_legume_preg_Ter [factor] | 1. (0,0.5]; 2. (0.5,2]; 3. (2,Inf] | 245 (18.8%); 269 (20.7%); 787 (60.5%) | 0 (0.0%) |
| 9 | h_meat_preg_Ter [factor] | 1. (0,6.5]; 2. (6.5,10]; 3. (10,Inf] | 427 (32.8%); 387 (29.7%); 487 (37.4%) | 0 (0.0%) |
| 10 | h_pamod_t3_None [factor] | 1. None; 2. Often; 3. Sometimes; 4. Very Often | 42 (3.2%); 474 (36.4%); 191 (14.7%); 594 (45.7%) | 0 (0.0%) |
| 11 | h_pavig_t3_None [factor] | 1. High; 2. Low; 3. Medium | 47 (3.6%); 952 (73.2%); 302 (23.2%) | 0 (0.0%) |
| 12 | h_veg_preg_Ter [factor] | 1. (0,8.8]; 2. (8.8,16.5]; 3. (16.5,Inf] | 539 (41.4%); 470 (36.1%); 292 (22.4%) | 0 (0.0%) |

Generated by summarytools 1.0.1 (R version 4.3.1)
2024-07-02

Data Summary Exposures: Chemicals

#Chemical
filtered_codebook_chemicals <- codebook %>%
  filter(domain == "Chemicals" & 
         family == "Phthalates" & 
         period == "Pregnancy" & 
         variable_name != "hs_sumDEHP_madj_Log2")
selectExposures <- filtered_codebook_chemicals$variable_name
summarytools::view(dfSummary(exposome[,names(exposome) %in% selectExposures], 
                             style = 'grid', 
                             plain.ascii = FALSE, 
                             valid.col = FALSE, 
                             headings = FALSE), 
                   method = "render")
| No | Variable | Stats / Values | Freqs (% of Valid) | Missing |
|---|---|---|---|---|
| 1 | hs_mbzp_madj_Log2 [numeric] | Mean (sd): 3 (1.6); min ≤ med ≤ max: -3.7 ≤ 2.9 ≤ 9.3; IQR (CV): 2.2 (0.5) | 873 distinct values | 0 (0.0%) |
| 2 | hs_mecpp_madj_Log2 [numeric] | Mean (sd): 5 (1); min ≤ med ≤ max: 2.4 ≤ 4.9 ≤ 10.4; IQR (CV): 1.3 (0.2) | 734 distinct values | 0 (0.0%) |
| 3 | hs_mehhp_madj_Log2 [numeric] | Mean (sd): 4.2 (1.2); min ≤ med ≤ max: -0.5 ≤ 4.1 ≤ 9.9; IQR (CV): 1.3 (0.3) | 890 distinct values | 0 (0.0%) |
| 4 | hs_mehp_madj_Log2 [numeric] | Mean (sd): 2.9 (1.4); min ≤ med ≤ max: -7.5 ≤ 3.1 ≤ 8.7; IQR (CV): 2 (0.5) | 873 distinct values | 0 (0.0%) |
| 5 | hs_meohp_madj_Log2 [numeric] | Mean (sd): 3.8 (1.2); min ≤ med ≤ max: 0 ≤ 3.7 ≤ 9.6; IQR (CV): 1.3 (0.3) | 894 distinct values | 0 (0.0%) |
| 6 | hs_mep_madj_Log2 [numeric] | Mean (sd): 7.8 (1.8); min ≤ med ≤ max: 3.3 ≤ 7.8 ≤ 14.1; IQR (CV): 2.5 (0.2) | 874 distinct values | 0 (0.0%) |
| 7 | hs_mibp_madj_Log2 [numeric] | Mean (sd): 5.3 (1.1); min ≤ med ≤ max: 0.9 ≤ 5.3 ≤ 9.5; IQR (CV): 1.3 (0.2) | 871 distinct values | 0 (0.0%) |
| 8 | hs_mnbp_madj_Log2 [numeric] | Mean (sd): 5 (1.2); min ≤ med ≤ max: -0.7 ≤ 4.9 ≤ 12.7; IQR (CV): 1.4 (0.2) | 897 distinct values | 0 (0.0%) |
| 9 | hs_ohminp_madj_Log2 [numeric] | Mean (sd): -0.3 (1.5); min ≤ med ≤ max: -11.5 ≤ -0.2 ≤ 6.1; IQR (CV): 1 (-5.1) | 730 distinct values | 0 (0.0%) |
| 10 | hs_oxominp_madj_Log2 [numeric] | Mean (sd): -0.1 (1.6); min ≤ med ≤ max: -11.6 ≤ 0 ≤ 5.6; IQR (CV): 1.2 (-28.7) | 766 distinct values | 0 (0.0%) |


Data Summary Exposures: Covariate

# Covariates 
filtered_codebook_covariates <- codebook %>%
  filter(variable_name %in% c("e3_sex_None", "h_cohort", "hs_child_age_None"))
selectCovariates <- filtered_codebook_covariates$variable_name
summarytools::view(dfSummary(covariates[,names(covariates) %in% selectCovariates], 
                             style = 'grid', 
                             plain.ascii = FALSE, 
                             valid.col = FALSE, 
                             headings = FALSE), 
                   method = "render")
| No | Variable | Stats / Values | Freqs (% of Valid) | Missing |
|---|---|---|---|---|
| 1 | h_cohort [factor] | 1. 1; 2. 2; 3. 3; 4. 4; 5. 5; 6. 6 | 202 (15.5%); 198 (15.2%); 224 (17.2%); 207 (15.9%); 272 (20.9%); 198 (15.2%) | 0 (0.0%) |
| 2 | e3_sex_None [factor] | 1. female; 2. male | 608 (46.7%); 693 (53.3%) | 0 (0.0%) |
| 3 | hs_child_age_None [numeric] | Mean (sd): 8 (1.6); min ≤ med ≤ max: 5.4 ≤ 8 ≤ 12.1; IQR (CV): 2.4 (0.2) | 879 distinct values | 0 (0.0%) |


Data Summary Exposures: Phenotype

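# Note: this summary still shows the original 4 categories, although 2 are expected after binning.
# A likely cause is that load() of HELIX.RData in the codebook chunk restores phenotype and overwrites the earlier recode.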
variable_name <- "hs_bmi_c_cat"
variable_index <- which(names(phenotype) == variable_name)

summarytools::view(dfSummary(phenotype[, variable_index, drop = FALSE],
                             style = 'grid',
                             plain.ascii = FALSE,
                             valid.col = FALSE,
                             headings = FALSE),
                   method = "render")
| No | Variable | Stats / Values | Freqs (% of Valid) | Missing |
|---|---|---|---|---|
| 1 | hs_bmi_c_cat [factor] | 1. 1; 2. 2; 3. 3; 4. 4 | 13 (1.0%); 904 (69.5%); 253 (19.4%); 131 (10.1%) | 0 (0.0%) |


Data Exploration

Combining covariates with phenotype (Full Data)

# Lifestyle variables
filtered_codebook_lifestyles <- codebook %>%
  filter(domain == "Lifestyles" & period == "Pregnancy")
selectExposures_lifestyle <- filtered_codebook_lifestyles$variable_name

# Chemical variables
filtered_codebook_chemicals <- codebook %>%
  filter(domain == "Chemicals" & family == "Phthalates" & period == "Pregnancy" & variable_name != "hs_sumDEHP_madj_Log2")
selectExposures_chemicals <- filtered_codebook_chemicals$variable_name

# Covariate variables
filtered_codebook_covariates <- codebook %>%
  filter(variable_name %in% c("e3_sex_None", "h_cohort", "hs_child_age_None"))
selectCovariates <- filtered_codebook_covariates$variable_name

# Phenotype variables
filtered_codebook_phenotype <- codebook %>%
  filter(domain == "Phenotype" & variable_name %in% c("hs_bmi_c_cat"))
selectPhenotypes <- filtered_codebook_phenotype$variable_name
all_selected_variables <- c("ID", selectExposures_lifestyle, selectExposures_chemicals, selectCovariates, selectPhenotypes, "age")

# Subset the data
subset_exposome <- exposome %>% dplyr::select(all_of(selectExposures_lifestyle), all_of(selectExposures_chemicals))
subset_covariates <- covariates %>% dplyr::select(all_of(selectCovariates))
subset_phenotype <- phenotype %>% dplyr::select(all_of(selectPhenotypes))

# Final Merge
exposome_phenotype_covariates <- exposome %>%
  dplyr::select(ID, all_of(selectExposures_lifestyle), all_of(selectExposures_chemicals)) %>%
  left_join(covariates %>% dplyr::select(ID, all_of(selectCovariates)), by = "ID") %>%
  left_join(phenotype %>% dplyr::select(ID, all_of(selectPhenotypes)), by = "ID")

# Binning BMI
exposome_phenotype_covariates <- exposome_phenotype_covariates %>%
  mutate(hs_bmi_c_cat = ifelse(hs_bmi_c_cat %in% c(1, 2), 0, 
                               ifelse(hs_bmi_c_cat %in% c(3, 4), 1, hs_bmi_c_cat)))

exposome_phenotype_covariates
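
The recode above overwrites hs_bmi_c_cat in place. Because the new code 1 is also one of the original codes, running the same step a second time on already-binned data would map every child to 0. A minimal sketch on made-up data, which writes to a hypothetical new column `bmi_binary` (not part of the original analysis) so that the step can be re-run safely:

```r
# Toy data standing in for hs_bmi_c_cat (values are made up)
toy <- data.frame(hs_bmi_c_cat = factor(c(1, 2, 3, 4, 2, 3)))

# Recode from the original 1-4 coding into a separate 0/1 column.
# as.character() avoids relying on the factor's internal integer codes.
recode_bmi <- function(bmi_cat) {
  ifelse(as.character(bmi_cat) %in% c("3", "4"), 1L, 0L)
}

toy$bmi_binary <- recode_bmi(toy$hs_bmi_c_cat)

# Re-running gives the same result because the source column is untouched
toy$bmi_binary <- recode_bmi(toy$hs_bmi_c_cat)
table(toy$bmi_binary)
```

Keeping the original column also makes it possible to check the binary split against the four-category counts reported in the phenotype summary.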

Table by Covariates: Sex, Cohort, Age

exposome_phenotype_covariates <- exposome_phenotype_covariates %>%
  mutate(
    e3_sex_None = factor(e3_sex_None),
    h_cohort = factor(h_cohort),
    hs_bmi_c_cat = factor(hs_bmi_c_cat))

# Labels
label(exposome_phenotype_covariates$e3_sex_None) <- "Sex"
label(exposome_phenotype_covariates$h_cohort) <- "Cohort"
label(exposome_phenotype_covariates$hs_child_age_None) <- "Child's Age"
label(exposome_phenotype_covariates$e3_alcpreg_yn_None) <- "Alcohol During Pregnancy"
label(exposome_phenotype_covariates$h_cereal_preg_Ter) <- "Cereal Intake During Pregnancy"
label(exposome_phenotype_covariates$h_dairy_preg_Ter) <- "Dairy Intake During Pregnancy"
label(exposome_phenotype_covariates$h_fastfood_preg_Ter) <- "Fast Food Intake During Pregnancy"
label(exposome_phenotype_covariates$h_fish_preg_Ter) <- "Fish Intake During Pregnancy"
label(exposome_phenotype_covariates$h_folic_t1_None) <- "Folic Acid Intake"
label(exposome_phenotype_covariates$h_fruit_preg_Ter) <- "Fruit Intake During Pregnancy"
label(exposome_phenotype_covariates$h_legume_preg_Ter) <- "Legume Intake During Pregnancy"
label(exposome_phenotype_covariates$h_meat_preg_Ter) <- "Meat Intake During Pregnancy"
label(exposome_phenotype_covariates$h_pamod_t3_None) <- "Physical Activity (Moderate)"
label(exposome_phenotype_covariates$h_pavig_t3_None) <- "Physical Activity (Vigorous)"
label(exposome_phenotype_covariates$h_veg_preg_Ter) <- "Vegetable Intake During Pregnancy"
label(exposome_phenotype_covariates$hs_mbzp_madj_Log2) <- "MBzP (Log2)"
label(exposome_phenotype_covariates$hs_mecpp_madj_Log2) <- "MECPP (Log2)"
label(exposome_phenotype_covariates$hs_mehhp_madj_Log2) <- "MEHHP (Log2)"
label(exposome_phenotype_covariates$hs_mehp_madj_Log2) <- "MEHP (Log2)"
label(exposome_phenotype_covariates$hs_meohp_madj_Log2) <- "MEOHP (Log2)"
label(exposome_phenotype_covariates$hs_mep_madj_Log2) <- "MEP (Log2)"
label(exposome_phenotype_covariates$hs_mibp_madj_Log2) <- "MiBP (Log2)"
label(exposome_phenotype_covariates$hs_mnbp_madj_Log2) <- "MnBP (Log2)"
label(exposome_phenotype_covariates$hs_ohminp_madj_Log2) <- "OH-MiNP (Log2)"
label(exposome_phenotype_covariates$hs_oxominp_madj_Log2) <- "OXO-MiNP (Log2)"
label(exposome_phenotype_covariates$hs_bmi_c_cat) <- "BMI Category"

# continuous variables
render_cont <- function(x) {
  sprintf("%.2f (%.2f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
}

# categorical variables
render_cat <- function(x) {
  paste0(names(table(x)), " (", table(x), ")", collapse = ", ")
}

# The merged data should have 27 columns: ID + 12 lifestyle + 10 phthalate + 3 covariate + 1 phenotype
columns <- c("age", "hs_child_age_None", "e3_alcpreg_yn_None", "h_cereal_preg_Ter",
             "h_dairy_preg_Ter", "h_fastfood_preg_Ter", "h_fish_preg_Ter", "h_folic_t1_None",
             "h_fruit_preg_Ter", "h_legume_preg_Ter", "h_meat_preg_Ter", "h_pamod_t3_None",
             "h_pavig_t3_None", "h_veg_preg_Ter", "hs_mbzp_madj_Log2", "hs_mecpp_madj_Log2",
             "hs_mehhp_madj_Log2", "hs_mehp_madj_Log2", "hs_meohp_madj_Log2", "hs_mep_madj_Log2",
             "hs_mibp_madj_Log2", "hs_mnbp_madj_Log2", "hs_ohminp_madj_Log2", "hs_oxominp_madj_Log2")

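# stratified by cohort
# Note: overall = TRUE labels the pooled column "TRUE"; a string such as overall = "Overall" gives a readable header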
table1(~  hs_child_age_None  + e3_alcpreg_yn_None + h_cereal_preg_Ter + h_dairy_preg_Ter +
         h_fastfood_preg_Ter + h_fish_preg_Ter + h_folic_t1_None + h_fruit_preg_Ter +
         h_legume_preg_Ter + h_meat_preg_Ter + h_pamod_t3_None + h_pavig_t3_None +
         h_veg_preg_Ter + hs_mbzp_madj_Log2 + hs_mecpp_madj_Log2 + hs_mehhp_madj_Log2 +
         hs_mehp_madj_Log2 + hs_meohp_madj_Log2 + hs_mep_madj_Log2 + hs_mibp_madj_Log2 +
         hs_mnbp_madj_Log2 + hs_ohminp_madj_Log2 + hs_oxominp_madj_Log2 | h_cohort,
       data = exposome_phenotype_covariates,
       render.continuous = render_cont, render.categorical = render_cat,
       overall = TRUE, topclass = "Rtable1-shade")
|  | 1 (N=202) | 2 (N=198) | 3 (N=224) | 4 (N=207) | 5 (N=272) | 6 (N=198) | TRUE (N=1301) |
|---|---|---|---|---|---|---|---|
| Child's Age | 6.61 (0.28) | 10.82 (0.58) | 8.78 (0.58) | 6.48 (0.47) | 8.46 (0.53) | 6.51 (0.30) | 7.98 (1.61) |
| Alcohol During Pregnancy | 0 (118), 1 (84) | 0 (128), 1 (70) | 0 (186), 1 (38) | 0 (186), 1 (21) | 0 (144), 1 (128) | 0 (134), 1 (64) | 0 (896), 1 (405) |
| Cereal Intake During Pregnancy | (0,9] (131), (9,27.3] (64), (27.3,Inf] (7) | (0,9] (4), (9,27.3] (114), (27.3,Inf] (80) | (0,9] (200), (9,27.3] (24), (27.3,Inf] (0) | (0,9] (119), (9,27.3] (87), (27.3,Inf] (1) | (0,9] (10), (9,27.3] (64), (27.3,Inf] (198) | (0,9] (67), (9,27.3] (106), (27.3,Inf] (25) | (0,9] (531), (9,27.3] (459), (27.3,Inf] (311) |
| Dairy Intake During Pregnancy | (0,17.1] (0), (17.1,27.1] (49), (27.1,Inf] (153) | (0,17.1] (30), (17.1,27.1] (66), (27.1,Inf] (102) | (0,17.1] (101), (17.1,27.1] (85), (27.1,Inf] (38) | (0,17.1] (0), (17.1,27.1] (43), (27.1,Inf] (164) | (0,17.1] (74), (17.1,27.1] (72), (27.1,Inf] (126) | (0,17.1] (65), (17.1,27.1] (65), (27.1,Inf] (68) | (0,17.1] (270), (17.1,27.1] (380), (27.1,Inf] (651) |
| Fast Food Intake During Pregnancy | (0,0.25] (0), (0.25,0.83] (81), (0.83,Inf] (121) | (0,0.25] (5), (0.25,0.83] (25), (0.83,Inf] (168) | (0,0.25] (0), (0.25,0.83] (80), (0.83,Inf] (144) | (0,0.25] (1), (0.25,0.83] (87), (0.83,Inf] (119) | (0,0.25] (88), (0.25,0.83] (170), (0.83,Inf] (14) | (0,0.25] (0), (0.25,0.83] (92), (0.83,Inf] (106) | (0,0.25] (94), (0.25,0.83] (535), (0.83,Inf] (672) |
| Fish Intake During Pregnancy | (0,1.9] (19), (1.9,4.1] (105), (4.1,Inf] (78) | (0,1.9] (107), (1.9,4.1] (62), (4.1,Inf] (29) | (0,1.9] (14), (1.9,4.1] (81), (4.1,Inf] (129) | (0,1.9] (44), (1.9,4.1] (82), (4.1,Inf] (81) | (0,1.9] (45), (1.9,4.1] (98), (4.1,Inf] (129) | (0,1.9] (114), (1.9,4.1] (62), (4.1,Inf] (22) | (0,1.9] (343), (1.9,4.1] (490), (4.1,Inf] (468) |
| Folic Acid Intake | 0 (90), 1 (112) | 0 (182), 1 (16) | 0 (41), 1 (183) | 0 (107), 1 (100) | 0 (171), 1 (101) | 0 (15), 1 (183) | 0 (606), 1 (695) |
| Fruit Intake During Pregnancy | (0,0.6] (0), (0.6,18.2] (166), (18.2,Inf] (36) | (0,0.6] (0), (0.6,18.2] (147), (18.2,Inf] (51) | (0,0.6] (0), (0.6,18.2] (115), (18.2,Inf] (109) | (0,0.6] (0), (0.6,18.2] (173), (18.2,Inf] (34) | (0,0.6] (0), (0.6,18.2] (190), (18.2,Inf] (82) | (0,0.6] (6), (0.6,18.2] (131), (18.2,Inf] (61) | (0,0.6] (6), (0.6,18.2] (922), (18.2,Inf] (373) |
| Legume Intake During Pregnancy | (0,0.5] (0), (0.5,2] (0), (2,Inf] (202) | (0,0.5] (1), (0.5,2] (20), (2,Inf] (177) | (0,0.5] (8), (0.5,2] (138), (2,Inf] (78) | (0,0.5] (0), (0.5,2] (1), (2,Inf] (206) | (0,0.5] (226), (0.5,2] (34), (2,Inf] (12) | (0,0.5] (10), (0.5,2] (76), (2,Inf] (112) | (0,0.5] (245), (0.5,2] (269), (2,Inf] (787) |
| Meat Intake During Pregnancy | (0,6.5] (54), (6.5,10] (46), (10,Inf] (102) | (0,6.5] (91), (6.5,10] (67), (10,Inf] (40) | (0,6.5] (47), (6.5,10] (117), (10,Inf] (60) | (0,6.5] (73), (6.5,10] (35), (10,Inf] (99) | (0,6.5] (65), (6.5,10] (66), (10,Inf] (141) | (0,6.5] (97), (6.5,10] (56), (10,Inf] (45) | (0,6.5] (427), (6.5,10] (387), (10,Inf] (487) |
| Physical Activity (Moderate) | None (5), Often (69), Sometimes (41), Very Often (87) | None (3), Often (80), Sometimes (28), Very Often (87) | None (11), Often (74), Sometimes (42), Very Often (97) | None (7), Often (75), Sometimes (20), Very Often (105) | None (6), Often (108), Sometimes (36), Very Often (122) | None (10), Often (68), Sometimes (24), Very Often (96) | None (42), Often (474), Sometimes (191), Very Often (594) |
| Physical Activity (Vigorous) | High (10), Low (151), Medium (41) | High (11), Low (137), Medium (50) | High (7), Low (172), Medium (45) | High (5), Low (161), Medium (41) | High (9), Low (183), Medium (80) | High (5), Low (148), Medium (45) | High (47), Low (952), Medium (302) |
| Vegetable Intake During Pregnancy | (0,8.8] (124), (8.8,16.5] (78), (16.5,Inf] (0) | (0,8.8] (106), (8.8,16.5] (71), (16.5,Inf] (21) | (0,8.8] (26), (8.8,16.5] (100), (16.5,Inf] (98) | (0,8.8] (113), (8.8,16.5] (93), (16.5,Inf] (1) | (0,8.8] (129), (8.8,16.5] (92), (16.5,Inf] (51) | (0,8.8] (41), (8.8,16.5] (36), (16.5,Inf] (121) | (0,8.8] (539), (8.8,16.5] (470), (16.5,Inf] (292) |
| MBzP (Log2) | 1.90 (1.29) | 4.55 (1.38) | 3.25 (1.52) | 3.13 (1.44) | 2.65 (1.23) | 2.48 (1.50) | 2.98 (1.60) |
| MECPP (Log2) | 4.88 (0.93) | 5.52 (1.08) | 4.83 (0.78) | 4.64 (0.72) | 5.17 (0.95) | 5.11 (1.13) | 5.03 (0.98) |
| MEHHP (Log2) | 3.38 (1.08) | 4.97 (1.19) | 4.43 (1.24) | 3.63 (0.82) | 4.23 (1.11) | 4.26 (1.37) | 4.16 (1.25) |
| MEHP (Log2) | 2.01 (1.27) | 3.41 (1.25) | 3.10 (1.53) | 2.31 (1.21) | 3.73 (1.05) | 2.82 (1.44) | 2.94 (1.43) |
| MEOHP (Log2) | 3.19 (1.09) | 4.48 (1.15) | 4.18 (1.20) | 3.35 (0.91) | 3.84 (1.12) | 3.59 (1.35) | 3.78 (1.22) |
| MEP (Log2) | 7.81 (1.78) | 7.13 (1.67) | 8.56 (1.59) | 8.60 (1.12) | 7.24 (2.03) | 7.34 (1.85) | 7.77 (1.82) |
| MiBP (Log2) | 5.26 (0.97) | 5.84 (1.04) | 4.91 (1.10) | 5.45 (0.67) | 5.13 (1.12) | 5.39 (1.07) | 5.31 (1.05) |
| MnBP (Log2) | 4.56 (1.08) | 5.70 (1.14) | 4.82 (1.29) | 5.11 (1.01) | 4.99 (1.06) | 4.57 (1.33) | 4.96 (1.21) |
| OH-MiNP (Log2) | -0.53 (1.30) | -0.64 (1.46) | -0.20 (0.68) | -0.46 (0.91) | 0.43 (1.45) | -0.67 (2.48) | -0.30 (1.53) |
| OXO-MiNP (Log2) | 0.14 (1.17) | -0.45 (1.64) | -0.07 (0.95) | -0.28 (0.68) | 0.41 (1.86) | -0.25 (2.39) | -0.06 (1.59) |

# stratified by sex
table1(~  hs_child_age_None  + e3_alcpreg_yn_None + h_cereal_preg_Ter + h_dairy_preg_Ter +
         h_fastfood_preg_Ter + h_fish_preg_Ter + h_folic_t1_None + h_fruit_preg_Ter +
         h_legume_preg_Ter + h_meat_preg_Ter + h_pamod_t3_None + h_pavig_t3_None +
         h_veg_preg_Ter + hs_mbzp_madj_Log2 + hs_mecpp_madj_Log2 + hs_mehhp_madj_Log2 +
         hs_mehp_madj_Log2 + hs_meohp_madj_Log2 + hs_mep_madj_Log2 + hs_mibp_madj_Log2 +
         hs_mnbp_madj_Log2 + hs_ohminp_madj_Log2 + hs_oxominp_madj_Log2 | e3_sex_None,
       data = exposome_phenotype_covariates,
       render.continuous = render_cont, render.categorical = render_cat,
       overall = TRUE, topclass = "Rtable1-shade")
|  | female (N=608) | male (N=693) | TRUE (N=1301) |
|---|---|---|---|
| Child's Age | 7.91 (1.58) | 8.03 (1.64) | 7.98 (1.61) |
| Alcohol During Pregnancy | 0 (437), 1 (171) | 0 (459), 1 (234) | 0 (896), 1 (405) |
| Cereal Intake During Pregnancy | (0,9] (246), (9,27.3] (213), (27.3,Inf] (149) | (0,9] (285), (9,27.3] (246), (27.3,Inf] (162) | (0,9] (531), (9,27.3] (459), (27.3,Inf] (311) |
| Dairy Intake During Pregnancy | (0,17.1] (124), (17.1,27.1] (171), (27.1,Inf] (313) | (0,17.1] (146), (17.1,27.1] (209), (27.1,Inf] (338) | (0,17.1] (270), (17.1,27.1] (380), (27.1,Inf] (651) |
| Fast Food Intake During Pregnancy | (0,0.25] (50), (0.25,0.83] (248), (0.83,Inf] (310) | (0,0.25] (44), (0.25,0.83] (287), (0.83,Inf] (362) | (0,0.25] (94), (0.25,0.83] (535), (0.83,Inf] (672) |
| Fish Intake During Pregnancy | (0,1.9] (145), (1.9,4.1] (238), (4.1,Inf] (225) | (0,1.9] (198), (1.9,4.1] (252), (4.1,Inf] (243) | (0,1.9] (343), (1.9,4.1] (490), (4.1,Inf] (468) |
| Folic Acid Intake | 0 (279), 1 (329) | 0 (327), 1 (366) | 0 (606), 1 (695) |
| Fruit Intake During Pregnancy | (0,0.6] (3), (0.6,18.2] (425), (18.2,Inf] (180) | (0,0.6] (3), (0.6,18.2] (497), (18.2,Inf] (193) | (0,0.6] (6), (0.6,18.2] (922), (18.2,Inf] (373) |
| Legume Intake During Pregnancy | (0,0.5] (116), (0.5,2] (124), (2,Inf] (368) | (0,0.5] (129), (0.5,2] (145), (2,Inf] (419) | (0,0.5] (245), (0.5,2] (269), (2,Inf] (787) |
| Meat Intake During Pregnancy | (0,6.5] (187), (6.5,10] (186), (10,Inf] (235) | (0,6.5] (240), (6.5,10] (201), (10,Inf] (252) | (0,6.5] (427), (6.5,10] (387), (10,Inf] (487) |
| Physical Activity (Moderate) | None (17), Often (213), Sometimes (95), Very Often (283) | None (25), Often (261), Sometimes (96), Very Often (311) | None (42), Often (474), Sometimes (191), Very Often (594) |
| Physical Activity (Vigorous) | High (22), Low (449), Medium (137) | High (25), Low (503), Medium (165) | High (47), Low (952), Medium (302) |
| Vegetable Intake During Pregnancy | (0,8.8] (258), (8.8,16.5] (214), (16.5,Inf] (136) | (0,8.8] (281), (8.8,16.5] (256), (16.5,Inf] (156) | (0,8.8] (539), (8.8,16.5] (470), (16.5,Inf] (292) |
| MBzP (Log2) | 2.91 (1.54) | 3.04 (1.64) | 2.98 (1.60) |
| MECPP (Log2) | 5.01 (1.01) | 5.05 (0.95) | 5.03 (0.98) |
| MEHHP (Log2) | 4.15 (1.25) | 4.16 (1.25) | 4.16 (1.25) |
| MEHP (Log2) | 2.96 (1.38) | 2.92 (1.47) | 2.94 (1.43) |
| MEOHP (Log2) | 3.79 (1.22) | 3.78 (1.22) | 3.78 (1.22) |
| MEP (Log2) | 7.81 (1.87) | 7.74 (1.77) | 7.77 (1.82) |
| MiBP (Log2) | 5.30 (1.02) | 5.32 (1.08) | 5.31 (1.05) |
| MnBP (Log2) | 4.95 (1.22) | 4.97 (1.20) | 4.96 (1.21) |
| OH-MiNP (Log2) | -0.29 (1.63) | -0.31 (1.43) | -0.30 (1.53) |
| OXO-MiNP (Log2) | 0.08 (1.45) | -0.18 (1.69) | -0.06 (1.59) |

Should I do any additional things with the outcome variable?

numeric_vars <- exposome_phenotype_covariates %>%
  dplyr::select(where(is.numeric))

cor_matrix <- cor(numeric_vars, use = "complete.obs")
corrplot(cor_matrix, method = "color", type = "upper", tl.col = "black", tl.srt = 45)

find_highly_correlated <- function(cor_matrix, threshold = 0.8) {
  cor_matrix[lower.tri(cor_matrix, diag = TRUE)] <- NA  
  cor_matrix <- as.data.frame(as.table(cor_matrix)) 
  cor_matrix <- na.omit(cor_matrix)  # Remove NA values
  cor_matrix <- cor_matrix[order(-abs(cor_matrix$Freq)), ]  
  cor_matrix <- cor_matrix %>% filter(abs(Freq) > threshold)  
  return(cor_matrix)
}
#  threshold of 0.60
highly_correlated_pairs <- find_highly_correlated(cor_matrix, threshold = 0.60)
highly_correlated_pairs
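
To illustrate what the helper above returns, here is a small base-R sketch on simulated data. The variables `a`, `b`, and `c` are made up for the example and are not HELIX variables; the steps mirror the function body (mask the lower triangle, reshape to long format, filter by absolute correlation).

```r
set.seed(1)
a <- rnorm(200)
toy <- data.frame(
  a = a,
  b = a + rnorm(200, sd = 0.3),  # built to be strongly correlated with a
  c = rnorm(200)                 # independent noise
)

cm <- cor(toy)
cm[lower.tri(cm, diag = TRUE)] <- NA              # keep each pair only once
pairs <- na.omit(as.data.frame(as.table(cm)))     # long format: Var1, Var2, Freq
pairs <- pairs[abs(pairs$Freq) > 0.60, ]          # same 0.60 threshold as above
pairs[order(-abs(pairs$Freq)), ]
```

Only the (a, b) pair should survive the filter, since `c` is generated independently of the other two columns.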

Correlated Variables on the Outcome

# Top 1
ggplot(exposome_phenotype_covariates, aes(x = hs_bmi_c_cat, y = hs_mehhp_madj_Log2)) +
  geom_boxplot(outlier.shape = NA) + 
  # jitter layer added so the plot matches its title and the helper function below
  geom_jitter(width = 0.2, height = 0, alpha = 0.5, color = "blue") +
  labs(title = "Boxplot and Jitter Plot of hs_mehhp_madj_Log2 by BMI Category",
       x = "BMI Category",
       y = "hs_mehhp_madj_Log2") +
  theme_minimal()

# Top 2
ggplot(exposome_phenotype_covariates, aes(x = hs_bmi_c_cat, y = hs_mecpp_madj_Log2)) +
  geom_boxplot(outlier.shape = NA) + 
  geom_jitter(width = 0.2, height = 0, alpha = 0.5, color = "blue") +
  labs(title = "Boxplot and Jitter Plot of hs_mecpp_madj_Log2 by BMI Category",
       x = "BMI Category",
       y = "hs_mecpp_madj_Log2") +
  theme_minimal()

variables <- c("e3_alcpreg_yn_None", "h_cereal_preg_Ter", "h_dairy_preg_Ter", "h_fastfood_preg_Ter",
               "h_fish_preg_Ter", "h_folic_t1_None", "h_fruit_preg_Ter", "h_legume_preg_Ter",
               "h_meat_preg_Ter", "h_pamod_t3_None", "h_pavig_t3_None", "h_veg_preg_Ter",
               "hs_mbzp_madj_Log2", "hs_mecpp_madj_Log2", "hs_mehhp_madj_Log2", "hs_mehp_madj_Log2",
               "hs_meohp_madj_Log2", "hs_mep_madj_Log2", "hs_mibp_madj_Log2", "hs_mnbp_madj_Log2",
               "hs_ohminp_madj_Log2", "hs_oxominp_madj_Log2", "e3_sex_None", "h_cohort",
               "hs_child_age_None")

# Plot one variable against the BMI category:
# boxplot with jitter for numeric variables, dodged bar chart otherwise.
plot_against_bmi_cat <- function(variable) {
  if (is.numeric(exposome_phenotype_covariates[[variable]])) {
    # .data[[variable]] replaces the deprecated aes_string()
    ggplot(exposome_phenotype_covariates, aes(x = hs_bmi_c_cat, y = .data[[variable]])) +
      geom_boxplot(outlier.shape = NA) +
      geom_jitter(width = 0.2, height = 0, alpha = 0.5, color = "blue") +
      labs(title = paste("Boxplot and Jitter Plot of", variable, "by hs_bmi_c_cat"),
           x = "BMI Category",
           y = variable) +
      theme_minimal()
  } else {
    ggplot(exposome_phenotype_covariates, aes(x = hs_bmi_c_cat, fill = .data[[variable]])) +
      geom_bar(position = "dodge") +
      labs(title = paste("Bar Plot of", variable, "by hs_bmi_c_cat"),
           x = "BMI Category",
           y = "Count",
           fill = variable) +
      theme_minimal()
  }
}

plots <- lapply(variables, plot_against_bmi_cat)
for (plot in plots) {
  print(plot)
}

Data Modeling

Lasso/Ridge Logistic Regression for BMI Category

0 = Underweight and Normal; 1 = Overweight and Obese

Model 1: Grouped Lasso Logistic Regression (With Important Variables) without Metabolomics Data

Finding the most important variables:

Sub Models:

Data Prep

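# Convert predictors to numeric. Factors become their integer level codes (tertiles 1-3, cohort 1-6),
# so the nominal h_cohort is treated as an equally spaced numeric score in the models below.
exposome_phenotype_covariates <- exposome_phenotype_covariates %>%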
  mutate(across(c("h_cereal_preg_Ter", "h_dairy_preg_Ter", "h_fastfood_preg_Ter",
                  "h_fish_preg_Ter", "h_folic_t1_None", "h_fruit_preg_Ter", 
                  "h_legume_preg_Ter", "h_meat_preg_Ter", "h_veg_preg_Ter",
                  "hs_mbzp_madj_Log2", "hs_mecpp_madj_Log2", "hs_mehhp_madj_Log2",
                  "hs_mehp_madj_Log2", "hs_meohp_madj_Log2", "hs_mep_madj_Log2", 
                  "hs_mibp_madj_Log2", "hs_mnbp_madj_Log2", "hs_ohminp_madj_Log2",
                  "hs_oxominp_madj_Log2", "e3_alcpreg_yn_None", "e3_sex_None", 
                  "h_cohort", "hs_child_age_None"), as.numeric))


X_dietary <- as.matrix(exposome_phenotype_covariates[, c(
  "h_cereal_preg_Ter", "h_dairy_preg_Ter", "h_fastfood_preg_Ter",
  "h_fish_preg_Ter", "h_folic_t1_None", "h_fruit_preg_Ter",
  "h_legume_preg_Ter", "h_meat_preg_Ter", "h_veg_preg_Ter")])

X_phthalates <- as.matrix(exposome_phenotype_covariates[, c(
  "hs_mbzp_madj_Log2", "hs_mecpp_madj_Log2", "hs_mehhp_madj_Log2",
  "hs_mehp_madj_Log2", "hs_meohp_madj_Log2", "hs_mep_madj_Log2",
  "hs_mibp_madj_Log2", "hs_mnbp_madj_Log2", "hs_ohminp_madj_Log2",
  "hs_oxominp_madj_Log2")])

X_combined <- as.matrix(exposome_phenotype_covariates[, c(
  "h_cereal_preg_Ter", "h_dairy_preg_Ter", "h_fastfood_preg_Ter",
  "h_fish_preg_Ter", "h_folic_t1_None", "h_fruit_preg_Ter",
  "h_legume_preg_Ter", "h_meat_preg_Ter", "h_veg_preg_Ter",
  "hs_mbzp_madj_Log2", "hs_mecpp_madj_Log2", "hs_mehhp_madj_Log2",
  "hs_mehp_madj_Log2", "hs_meohp_madj_Log2", "hs_mep_madj_Log2",
  "hs_mibp_madj_Log2", "hs_mnbp_madj_Log2", "hs_ohminp_madj_Log2",
  "hs_oxominp_madj_Log2",
  "e3_alcpreg_yn_None", "e3_sex_None", "h_cohort", "hs_child_age_None")])

y <- as.numeric(as.character(exposome_phenotype_covariates$hs_bmi_c_cat))
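The data prep above converts the outcome with as.numeric(as.character(...)) but converts the predictors with as.numeric alone. If any of those predictors are stored as factors, the two conversions can give different numbers. The following is a minimal sketch on a made-up factor (toy_factor is a hypothetical name, not a HELIX variable) showing the difference:

```r
# Hypothetical factor whose labels are numbers stored as text
toy_factor <- factor(c("10", "20", "10", "30"))

# as.numeric() on a factor returns the internal level codes
level_codes <- as.numeric(toy_factor)

# Going through as.character() first recovers the labels themselves
label_values <- as.numeric(as.character(toy_factor))

print(level_codes)
print(label_values)
```

For ordered tertile variables the level codes (1, 2, 3) may be exactly what is wanted, but for an unordered variable such as a cohort label the codes impose an arbitrary ordering, so it is worth checking the class of each column before converting.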

Fit Grouped Lasso

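# Fit Grouped Lasso models
# Note: rep(1:9, each = 1) is the same as 1:9, so every predictor is its own group
# and the group penalty behaves like an ordinary (ungrouped) lasso penalty here.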
model_dietary <- grpreg(X_dietary, y, group = rep(1:9, each = 1), family = "binomial")
model_phthalates <- grpreg(X_phthalates, y, group = rep(1:10, each = 1), family = "binomial")
model_combined <- grpreg(X_combined, y, group = c(rep(1:9, each = 1), rep(10:19, each = 1), rep(20:23, each = 1)), family = "binomial")
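The group argument matters most when one variable spans several columns, for example a dummy-coded categorical predictor. As a rough sketch of one way to build such a group vector, the toy example below (toy, X_toy and group_ids are hypothetical names and the data are made up) uses the "assign" attribute that model.matrix attaches to its result, which records the model term each column came from:

```r
# Made-up data: one three-level factor, one numeric, one two-level factor
toy <- data.frame(
  cohort = factor(c("A", "B", "C", "A", "B", "C")),
  age    = c(6.1, 7.4, 8.0, 9.2, 10.5, 6.8),
  sex    = factor(c("f", "m", "f", "m", "f", "m"))
)

# Dummy-code the factors; the first column is the intercept
mm <- model.matrix(~ cohort + age + sex, data = toy)

# "assign" maps each column to its term; drop the intercept entry (0)
group_ids <- attr(mm, "assign")[-1]
X_toy <- mm[, -1, drop = FALSE]

print(colnames(X_toy))
print(group_ids)

# Assumption: a vector like group_ids could then be passed as
# grpreg(X_toy, y, group = group_ids, family = "binomial"),
# so that both cohort dummies enter or leave the model together.
```

With this layout the two cohort columns share one group id, which is the situation a grouped lasso is designed for.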

Cross-Validation to Select the Best Lambda

cv_dietary <- cv.grpreg(X_dietary, y, group = rep(1:9, each = 1), family = "binomial", nfolds = 10)
cv_phthalates <- cv.grpreg(X_phthalates, y, group = rep(1:10, each = 1), family = "binomial", nfolds = 10)
cv_combined <- cv.grpreg(X_combined, y, group = c(rep(1:9, each = 1), rep(10:19, each = 1), rep(20:23, each = 1)), family = "binomial", nfolds = 10)

best_lambda_dietary <- cv_dietary$lambda.min
best_lambda_phthalates <- cv_phthalates$lambda.min
best_lambda_combined <- cv_combined$lambda.min

cat("Best lambda for dietary model:", best_lambda_dietary, "\n")
## Best lambda for dietary model: 0.007656545
cat("Best lambda for phthalates model:", best_lambda_phthalates, "\n")
## Best lambda for phthalates model: 0.004814766
cat("Best lambda for combined model:", best_lambda_combined, "\n")
## Best lambda for combined model: 0.01110834
plot(cv_dietary, main = "Cross-validation for Dietary Model")

plot(cv_phthalates, main = "Cross-validation for Phthalates Model")

plot(cv_combined, main = "Cross-validation for Combined Model")
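Cross-validation folds are assigned at random, so the selected lambda values above can differ from run to run. A minimal sketch of one way to make the fold assignment reproducible is shown below; n_toy, n_folds and fold_ids are hypothetical names, and the seed value is arbitrary:

```r
# Fix the random number generator so the same folds are drawn every time
set.seed(123)

n_toy   <- 50   # hypothetical number of observations
n_folds <- 10

# Balanced fold labels (each fold gets n_toy / n_folds rows), then shuffled
fold_ids <- sample(rep(seq_len(n_folds), length.out = n_toy))

print(table(fold_ids))

# Assumption: cv.grpreg is documented as accepting `fold` and `seed` arguments,
# so a vector like fold_ids (one entry per row of X) could be reused across
# the dietary, phthalates and combined models to compare them on equal folds.
```

Reusing one fold vector for all three models also means that differences in cross-validation error reflect the predictors rather than the random split.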

Train Models and Calculate Accuracy:

model_dietary_final <- grpreg(X_dietary, y, group = rep(1:9, each = 1), family = "binomial", lambda = best_lambda_dietary)
model_phthalates_final <- grpreg(X_phthalates, y, group = rep(1:10, each = 1), family = "binomial", lambda = best_lambda_phthalates)
model_combined_final <- grpreg(X_combined, y, group = c(rep(1:9, each = 1), rep(10:19, each = 1), rep(20:23, each = 1)), family = "binomial", lambda = best_lambda_combined)

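# In-sample predictions: classify as 1 (overweight/obese) when the predicted probability exceeds 0.5
pred_dietary <- predict(model_dietary_final, X_dietary, type = "response") > 0.5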
pred_phthalates <- predict(model_phthalates_final, X_phthalates, type = "response") > 0.5
pred_combined <- predict(model_combined_final, X_combined, type = "response") > 0.5

accuracy_dietary <- mean(pred_dietary == y)
accuracy_phthalates <- mean(pred_phthalates == y)
accuracy_combined <- mean(pred_combined == y)

cat("Accuracy for dietary model:", accuracy_dietary, "\n")
## Accuracy for dietary model: 0.7048424
cat("Accuracy for phthalates model:", accuracy_phthalates, "\n")
## Accuracy for phthalates model: 0.7056111
cat("Accuracy for combined model:", accuracy_combined, "\n")
## Accuracy for combined model: 0.7056111
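The accuracies above are computed on the same data used to fit the models, and the phthalates and combined models report an identical value. One possible explanation is that the models predict the majority class for nearly everyone, so it may help to compare accuracy against a majority-class baseline and to look at a confusion table. The sketch below uses made-up labels (y_toy and pred_toy are hypothetical, not HELIX results):

```r
# Made-up outcome: 70% class 0, 30% class 1
y_toy <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)

# A "model" that always predicts the majority class
pred_toy <- rep(0, length(y_toy))

# Accuracy looks respectable ...
accuracy <- mean(pred_toy == y_toy)

# ... but it is no better than the majority-class baseline
baseline <- max(mean(y_toy), 1 - mean(y_toy))

# Confusion table with both classes kept as levels
confusion <- table(
  Predicted = factor(pred_toy, levels = 0:1),
  Actual    = factor(y_toy, levels = 0:1)
)

# Sensitivity: share of actual class-1 cases that were predicted as 1
sensitivity <- unname(confusion["1", "1"] / sum(confusion[, "1"]))

print(confusion)
cat("Accuracy:", accuracy, "Baseline:", baseline, "Sensitivity:", sensitivity, "\n")
```

Applying the same checks to pred_dietary, pred_phthalates and pred_combined, ideally on held-out data, would show whether the models identify any overweight or obese children at all.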

Variable Importance:

# Extracting non-zero coef
extract_nonzero_coefficients <- function(model, variable_names) {
  coefs <- coef(model)
  nonzero_indices <- which(coefs != 0)
  nonzero_coefs <- coefs[nonzero_indices]
  nonzero_variable_names <- c("(Intercept)", variable_names)[nonzero_indices]
  return(data.frame(Variable = nonzero_variable_names, Coefficient = nonzero_coefs))
}

# dietary model
dietary_variable_names <- c("h_cereal_preg_Ter", "h_dairy_preg_Ter", "h_fastfood_preg_Ter", 
                            "h_fish_preg_Ter", "h_folic_t1_None", "h_fruit_preg_Ter", 
                            "h_legume_preg_Ter", "h_meat_preg_Ter", "h_veg_preg_Ter")

#  phthalates model
phthalates_variable_names <- c("hs_mbzp_madj_Log2", "hs_mecpp_madj_Log2", "hs_mehhp_madj_Log2", 
                               "hs_mehp_madj_Log2", "hs_meohp_madj_Log2", "hs_mep_madj_Log2", 
                               "hs_mibp_madj_Log2", "hs_mnbp_madj_Log2", "hs_ohminp_madj_Log2", 
                               "hs_oxominp_madj_Log2")

# combined model: dietary, phthalate and covariate names, in the same column order as X_combined
combined_variable_names <- c(dietary_variable_names, phthalates_variable_names,
                             "e3_alcpreg_yn_None", "e3_sex_None", "h_cohort", "hs_child_age_None")
important_variables_dietary <- extract_nonzero_coefficients(model_dietary_final, dietary_variable_names)
important_variables_phthalates <- extract_nonzero_coefficients(model_phthalates_final, phthalates_variable_names)
important_variables_combined <- extract_nonzero_coefficients(model_combined_final, combined_variable_names)

cat("Important variables for the dietary model:\n")
## Important variables for the dietary model:
print(important_variables_dietary)
##                                Variable Coefficient
## (Intercept)                 (Intercept) -0.19488583
## h_cereal_preg_Ter     h_cereal_preg_Ter -0.38179702
## h_dairy_preg_Ter       h_dairy_preg_Ter -0.13157615
## h_fastfood_preg_Ter h_fastfood_preg_Ter  0.10915252
## h_folic_t1_None         h_folic_t1_None  0.00669499
## h_fruit_preg_Ter       h_fruit_preg_Ter  0.13780766
## h_meat_preg_Ter         h_meat_preg_Ter -0.14433855
cat("\nImportant variables for the phthalates model:\n")
## 
## Important variables for the phthalates model:
print(important_variables_phthalates)
##                                  Variable Coefficient
## (Intercept)                   (Intercept) -1.48406784
## hs_mbzp_madj_Log2       hs_mbzp_madj_Log2  0.05691807
## hs_mecpp_madj_Log2     hs_mecpp_madj_Log2  0.01319344
## hs_mehhp_madj_Log2     hs_mehhp_madj_Log2  0.04107015
## hs_mehp_madj_Log2       hs_mehp_madj_Log2 -0.02858354
## hs_mep_madj_Log2         hs_mep_madj_Log2  0.06723521
## hs_mibp_madj_Log2       hs_mibp_madj_Log2  0.01526384
## hs_mnbp_madj_Log2       hs_mnbp_madj_Log2 -0.06517443
## hs_oxominp_madj_Log2 hs_oxominp_madj_Log2 -0.06543643
cat("\nImportant variables for the combined model:\n")
## 
## Important variables for the combined model:
print(important_variables_combined)
##                                  Variable Coefficient
## (Intercept)                   (Intercept) -0.45477470
## h_cereal_preg_Ter       h_cereal_preg_Ter -0.37125320
## h_dairy_preg_Ter         h_dairy_preg_Ter -0.09439318
## h_fastfood_preg_Ter   h_fastfood_preg_Ter  0.09335770
## h_fruit_preg_Ter         h_fruit_preg_Ter  0.10208311
## h_meat_preg_Ter           h_meat_preg_Ter -0.11378283
## hs_mbzp_madj_Log2       hs_mbzp_madj_Log2  0.02617643
## hs_mecpp_madj_Log2     hs_mecpp_madj_Log2  0.01461065
## hs_mep_madj_Log2         hs_mep_madj_Log2  0.01108850
## hs_oxominp_madj_Log2 hs_oxominp_madj_Log2 -0.04033985
## e3_alcpreg_yn_None     e3_alcpreg_yn_None -0.13649722
## h_cohort                         h_cohort  0.04592868
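Because these are logistic models, each coefficient is a change in log-odds per one-unit increase in the (numeric-coded) predictor. Exponentiating gives an odds ratio, which is often easier to read. The sketch below copies three dietary-model coefficients from the output above into a hypothetical vector, log_odds, and converts them:

```r
# Coefficients copied from the dietary model output (log-odds scale)
log_odds <- c(
  h_cereal_preg_Ter   = -0.38179702,
  h_fastfood_preg_Ter =  0.10915252,
  h_meat_preg_Ter     = -0.14433855
)

# Odds ratio per one-unit increase in each predictor
odds_ratios <- exp(log_odds)

print(round(odds_ratios, 3))
```

An odds ratio below 1 (cereal, roughly 0.68) points toward lower odds of overweight or obesity per tertile step, and a value above 1 (fast food) toward higher odds. Lasso estimates are shrunk toward zero, so these ratios are best read as indications of direction and relative size, not as unbiased effect estimates.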

Model 2: Grouped Lasso Full Model + Metabolomics Data

Model 3: Best Model + Metabolomics Data

#exposome_phenotype_covariates 
#metabol_serum.d
#Attaching Serum

https://stat.ethz.ch/Manuscripts/buhlmann/logistic-grouplasso-final.pdf
https://cran.r-project.org/web/packages/grplasso/grplasso.pdf